Cost-Sensitive Decision Trees with Pre-pruning
Abstract
This paper explores two simple, efficient pre-pruning strategies that keep the cost-sensitive decision tree algorithm from overfitting. The first limits the cost-sensitive decision trees to a depth of two. The second prunes the trees using a pre-specified threshold. An empirical study shows that, compared with the error-based tree algorithm C4.5 and several other cost-sensitive tree algorithms, the new cost-sensitive decision trees with pre-pruning are more efficient and perform well on most of the UCI data sets in the study.
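The two pre-pruning rules can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes the following:

* Attributes are categorical.
* Misclassification costs are given as a matrix `cost[actual][predicted]`.
* Each split picks the attribute whose children have the lowest total misclassification cost.
* Test costs are ignored.

None of these choices come from the paper. The names `leaf_cost`, `build`, `predict`, `max_depth`, and `threshold`, and the toy data, are hypothetical.

```python
from collections import Counter

# Hypothetical cost matrix cost[actual][predicted]: a missed "yes" costs 5, a false "yes" costs 1.
COST = {"yes": {"yes": 0, "no": 5}, "no": {"yes": 1, "no": 0}}
ROWS = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
]
LABELS = ["yes", "no", "no", "no", "yes", "yes"]


def leaf_cost(labels, cost):
    """Return (class, total cost) for labelling every example with the cheapest class."""
    counts = Counter(labels)
    best_class, best_total = None, float("inf")
    for pred in cost:
        total = sum(n * cost[actual][pred] for actual, n in counts.items())
        if total < best_total:
            best_class, best_total = pred, total
    return best_class, best_total


def build(rows, labels, attrs, cost, depth=0, max_depth=2, threshold=0.0):
    """Grow a cost-sensitive tree over categorical attributes with two pre-pruning rules."""
    pred, base = leaf_cost(labels, cost)
    node = {"class": pred, "cost": base, "attr": None, "children": {}}
    # Pre-pruning rule 1: stop growing once the depth limit (two in the abstract) is reached.
    if depth >= max_depth:
        return node
    best_attr, best_cost, best_groups = None, base, None
    for a in attrs:
        groups = {}  # attribute value -> (rows, labels) of that branch
        for row, y in zip(rows, labels):
            rs, ys = groups.setdefault(row[a], ([], []))
            rs.append(row)
            ys.append(y)
        split_cost = sum(leaf_cost(ys, cost)[1] for _, ys in groups.values())
        if split_cost < best_cost:
            best_attr, best_cost, best_groups = a, split_cost, groups
    # Pre-pruning rule 2: split only if it saves more than `threshold` cost units.
    if best_attr is None or base - best_cost <= threshold:
        return node
    node["attr"] = best_attr
    rest = [a for a in attrs if a != best_attr]
    for value, (rs, ys) in best_groups.items():
        node["children"][value] = build(rs, ys, rest, cost, depth + 1, max_depth, threshold)
    return node


def predict(node, row):
    """Walk down the tree; fall back to the current node's class for unseen values."""
    while node["attr"] is not None and row[node["attr"]] in node["children"]:
        node = node["children"][row[node["attr"]]]
    return node["class"]


if __name__ == "__main__":
    tree = build(ROWS, LABELS, ["outlook", "windy"], COST)
    print(predict(tree, {"outlook": "sunny", "windy": "yes"}))
```

With the default `threshold=0.0`, the root splits on `outlook`, because that split lowers the cost from 3 to 1. The `sunny` branch then splits on `windy`. Raising `threshold` to 1.5 blocks that second split, since it saves only 1 cost unit. A threshold of 2 leaves the tree as a single leaf.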
Similar resources
CC4.5: cost-sensitive decision tree pruning
There are many methods to prune decision trees, but the idea of cost-sensitive pruning has received much less investigation even though additional flexibility and increased performance can be obtained from this method. In this paper, we introduce a cost-sensitive decision tree pruning algorithm called CC4.5 based on the C4.5 algorithm. This algorithm uses the same method as C4.5 to construct th...
Decision Tree Pruning Using Expert Knowledge
Decision tree technology has proven to be a valuable way of capturing human decision making within a computer. It has long been a popular artificial intelligence (AI) technique. During the 1980s, it was one of the primary ways of creating an AI system. During the early 1990s, it fell somewhat out of favor, as did the AI field in general. However, during the later 1990s, with ...
Cost-sensitive Decision Trees with Post-pruning and Competition for Numeric Data
The decision tree is an effective classification approach in data mining and machine learning. In some applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, some cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, ICET and λ-ID3, have been proposed to deal with the issue. In this paper, we develop a decision tree algorit...
Cost-sensitive C4.5 with post-pruning and competition
The decision tree is an effective classification approach in data mining and machine learning. In applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, some cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, and λ-ID3, have been proposed to deal with the issue. These algorithms handle only symbolic data. In this paper, we ...
Induction of Modular Classification Rules: Using Jmax-pruning
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable to that of decision trees and in some cases even outperform them. Both kinds of algorithms tend to overfit on large and nois...